Mean-Field Controls with Q-Learning for Cooperative MARL: Convergence and Complexity Analysis


Article Data
History: Submitted 15 October 2020; Accepted 16 August 2021; Published online 28 2021
Keywords: mean-field control, multi-agent reinforcement learning, Q-learning, cooperative games, dynamic programming principle
AMS Subject Headings: 49N80, 68Q32, 68T05, 90C40
Publication Data: ISSN (online) 2577-0187; Publisher: Society for Industrial and Applied Mathematics; CODEN: sjmdaq


Related articles

Fastest Convergence for Q-learning

The Zap Q-learning algorithm introduced in this paper is an improvement of Watkins’ original algorithm and recent competitors in several respects. It is a matrix-gain algorithm designed so that its asymptotic variance is optimal. Moreover, an ODE analysis suggests that the transient behavior is a close match to a deterministic Newton-Raphson implementation. This is made possible by a two time-s...

Full text

Cooperative Mean-Field Type Games

In the standard formulation of a game, a player's payoff function depends on the states and actions of all the players. Yet real-world applications suggest also considering a functional of the probability measure of the states and actions of all the players. In this paper, we consider cooperative mean-field type games in which the state dynamics and the payoffs depend not only on the state and act...

Full text

On Mean Field Convergence and Stationary Regime

Assume that a family of stochastic processes on some Polish space E converges to a deterministic process; the convergence is in distribution (hence in probability) at every fixed point in time. This assumption holds for a large family of processes, among which are many mean-field interaction models, and is weaker than previously assumed. We show that any limit point of an invariant probability of th...

Full text

Expertness based cooperative Q-learning

By using other agents' experiences and knowledge, a learning agent may learn faster, make fewer mistakes, and create some rules for unseen situations. These benefits can be gained if the learning agent extracts proper rules from the other agents' knowledge for its own requirements. One possible way to do this is to have the learner assign some expertness values (intelligence-level values) ...
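One simple way to realize the expertness-based combination sketched above is to form each agent's new Q-table as an expertness-weighted average of all agents' tables. The function name, weighting scheme, and toy numbers below are assumptions for illustration, not the paper's exact mechanism.

```python
import numpy as np

def combine_q_tables(q_tables, expertness):
    """Weighted average of agents' Q-tables, with weights proportional
    to each agent's non-negative expertness score."""
    w = np.asarray(expertness, dtype=float)
    w = w / w.sum()
    # Contract the agent axis: result[s, a] = sum_i w[i] * q_tables[i][s, a].
    return np.tensordot(w, np.asarray(q_tables), axes=1)

# Three agents, each with a 2-state x 2-action Q-table (toy numbers).
qs = [np.array([[1.0, 0.0], [0.0, 1.0]]),
      np.array([[0.0, 2.0], [2.0, 0.0]]),
      np.array([[1.0, 1.0], [1.0, 1.0]])]
merged = combine_q_tables(qs, expertness=[3.0, 1.0, 0.0])
print(merged)  # the most expert agent's table dominates the average
```

An agent with zero expertness contributes nothing to the merged table, so an inexperienced learner effectively copies from its more expert peers.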

Full text

Convergence of Optimistic and Incremental Q-Learning

Yishay Mansour. We show the convergence of two deterministic variants of Q-learning. The first is the widely used optimistic Q-learning, which initializes the Q-values to large initial values and then follows a greedy policy with respect to the Q-values. We show that setting the initial value sufficiently large guarantees convergence to an ε-optimal policy. The second is a new and novel algo...
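The optimistic initialization described above can be sketched concretely: start every Q-value at an upper bound r_max/(1-γ) on any achievable return, so that unvisited state-action pairs look attractive and the purely greedy policy still explores. The toy MDP and per-pair step size below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 2-state, 2-action MDP (hypothetical, for illustration only).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # P[s, a, s']
              [[0.7, 0.3], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],                  # R[s, a]
              [0.0, 2.0]])
gamma, r_max = 0.9, R.max()

# Optimistic initialization: r_max / (1 - gamma) upper-bounds the optimal
# value, so every unvisited pair starts out looking as good as possible.
Q = np.full((2, 2), r_max / (1 - gamma))
counts = np.zeros((2, 2))
s = 0
for _ in range(50000):
    a = int(Q[s].argmax())             # purely greedy: no epsilon needed
    s2 = int(rng.choice(2, p=P[s, a]))
    counts[s, a] += 1
    alpha = 1.0 / counts[s, a]         # per-pair diminishing step size
    Q[s, a] += alpha * (R[s, a] + gamma * Q[s2].max() - Q[s, a])
    s = s2
print(Q)
```

Because every target R[s, a] + γ·max Q[s'] is at most r_max/(1-γ), the Q-values only decay from the optimistic bound toward their true values, which is why a sufficiently large initial value drives the exploration the abstract refers to.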

Full text


Journal

Journal title: SIAM Journal on Mathematics of Data Science

Year: 2021

ISSN: 2577-0187

DOI: https://doi.org/10.1137/20m1360700